Search | VHL Regional Portal

1.

Diverse Taxonomies for Diverse Chemistries: Enhanced Representation of Natural Product Metabolism in UniProtKB.

Feuermann, Marc; Boutet, Emmanuel; Morgat, Anne; Axelsen, Kristian B; Bansal, Parit; Bolleman, Jerven; de Castro, Edouard; Coudert, Elisabeth; Gasteiger, Elisabeth; Géhant, Sébastien; Lieberherr, Damien; Lombardot, Thierry; Neto, Teresa B; Pedruzzi, Ivo; Poux, Sylvain; Pozzato, Monica; Redaschi, Nicole; Bridge, Alan.

Metabolites ; 11(1), 2021 Jan 12.

Article in English | MEDLINE | ID: mdl-33445429

ABSTRACT

The UniProt Knowledgebase (UniProtKB) is a comprehensive, high-quality, and freely accessible resource of protein sequences and functional annotation that covers genomes and proteomes from tens of thousands of taxa, including a broad range of plants and microorganisms producing natural products of medical, nutritional, and agronomical interest. Here we describe work that enhances the utility of UniProtKB as a support for both the study of natural products and for their discovery. The foundation of this work is an improved representation of natural product metabolism in UniProtKB using Rhea, an expert-curated knowledgebase of biochemical reactions that is built on the ChEBI (Chemical Entities of Biological Interest) ontology of small molecules. Knowledge of natural products and precursors is captured in ChEBI, enzyme-catalyzed reactions in Rhea, and enzymes in UniProtKB/Swiss-Prot, thereby linking chemical structure data directly to protein knowledge. We provide a practical demonstration of how users can search UniProtKB for protein knowledge relevant to natural products through interactive or programmatic queries using metabolite names and synonyms, chemical identifiers, chemical classes, and chemical structures, and show how to federate UniProtKB with other data and knowledge resources and tools using semantic web technologies such as RDF and SPARQL. All UniProtKB data are freely available for download in a broad range of formats for users to further mine or exploit as an annotation source, to enrich other natural product datasets and databases.

2.

Term Matrix: a novel Gene Ontology annotation quality control system based on ontology term co-annotation patterns.

Wood, Valerie; Carbon, Seth; Harris, Midori A; Lock, Antonia; Engel, Stacia R; Hill, David P; Van Auken, Kimberly; Attrill, Helen; Feuermann, Marc; Gaudet, Pascale; Lovering, Ruth C; Poux, Sylvain; Rutherford, Kim M; Mungall, Christopher J.

Open Biol ; 10(9): 200149, 2020 09.

Article in English | MEDLINE | ID: mdl-32875947

ABSTRACT

Biological processes are accomplished by the coordinated action of gene products. Gene products often participate in multiple processes, and can therefore be annotated to multiple Gene Ontology (GO) terms. Nevertheless, processes that are functionally, temporally and/or spatially distant may have few gene products in common, and co-annotation to unrelated processes probably reflects errors in literature curation, ontology structure or automated annotation pipelines. We have developed an annotation quality control workflow that uses rules based on mutually exclusive processes to detect annotation errors, based on and validated by case studies including the three we present here: fission yeast protein-coding gene annotations over time; annotations for cohesin complex subunits in human and model species; and annotations using a selected set of GO biological process terms in human and five model species. For each case study, we reviewed available GO annotations, identified pairs of biological processes which are unlikely to be correctly co-annotated to the same gene products (e.g. amino acid metabolism and cytokinesis), and traced erroneous annotations to their sources. To date we have generated 107 quality control rules, and corrected 289 manual annotations in eukaryotes and over 52 700 automatically propagated annotations across all taxa.

Subjects

Computational Biology/methods, Gene Ontology, Molecular Sequence Annotation, Genetic Databases, Molecular Evolution, Fungal Genome, Genomics/methods, Quality Control, Schizosaccharomyces/genetics, Web Browser, Workflow
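
The rule-based check described in the Term Matrix abstract (pairs of processes that are unlikely to be correctly co-annotated to the same gene product) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the rule set, the gene names, and the function `find_violations` are hypothetical, and the sketch compares term labels directly, ignoring the ontology hierarchy that a real workflow would need to take into account.

```python
from itertools import combinations

# Hypothetical rule set: unordered pairs of process labels that should not be
# co-annotated to one gene product. The single pair below is the example
# given in the abstract; real rules would use GO identifiers.
EXCLUSION_RULES = {
    frozenset({"amino acid metabolism", "cytokinesis"}),
}


def find_violations(annotations, rules=EXCLUSION_RULES):
    """Return sorted (gene, term_a, term_b) tuples that match an exclusion rule.

    `annotations` maps a gene identifier to an iterable of process labels.
    """
    violations = []
    for gene, terms in annotations.items():
        # Test every unordered pair of distinct terms annotated to this gene.
        for term_a, term_b in combinations(sorted(set(terms)), 2):
            if frozenset({term_a, term_b}) in rules:
                violations.append((gene, term_a, term_b))
    return sorted(violations)


if __name__ == "__main__":
    # Made-up annotations for illustration only.
    example = {
        "geneA": ["amino acid metabolism", "cytokinesis", "translation"],
        "geneB": ["cytokinesis"],
    }
    for gene, term_a, term_b in find_violations(example):
        print(f"{gene}: suspicious co-annotation of '{term_a}' and '{term_b}'")
```

Each flagged pair is only a candidate error; as the abstract describes, the annotation then has to be traced to its source and reviewed.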

3.

A Coordinated Approach by Public Domain Bioinformatics Resources to Aid the Fight Against Alzheimer's Disease Through Expert Curation of Key Protein Targets.

Breuza, Lionel; Arighi, Cecilia N; Argoud-Puy, Ghislaine; Casals-Casas, Cristina; Estreicher, Anne; Famiglietti, Maria Livia; Georghiou, George; Gos, Arnaud; Gruaz-Gumowski, Nadine; Hinz, Ursula; Hyka-Nouspikel, Nevila; Kramarz, Barbara; Lovering, Ruth C; Lussi, Yvonne; Magrane, Michele; Masson, Patrick; Perfetto, Livia; Poux, Sylvain; Rodriguez-Lopez, Milagros; Stoeckert, Christian; Sundaram, Shyamala; Wang, Li-San; Wu, Elizabeth; Orchard, Sandra.

J Alzheimers Dis ; 77(1): 257-273, 2020.

Article in English | MEDLINE | ID: mdl-32716361

ABSTRACT

BACKGROUND: The analysis and interpretation of data generated from patient-derived clinical samples relies on access to high-quality bioinformatics resources. These are maintained and updated by expert curators extracting knowledge from unstructured biological data described in free-text journal articles and converting this into more structured, computationally-accessible forms. This enables analyses such as functional enrichment of sets of genes/proteins using the Gene Ontology, and makes the searching of data more productive by managing issues such as gene/protein name synonyms, identifier mapping, and data quality. OBJECTIVE: To undertake a coordinated annotation update of key public-domain resources to better support Alzheimer's disease research. METHODS: We have systematically identified target proteins critical to disease process, in part by accessing informed input from the clinical research community. RESULTS: Data from 954 papers have been added to the UniProtKB, Gene Ontology, and the International Molecular Exchange Consortium (IMEx) databases, with 299 human proteins and 279 orthologs updated in UniProtKB. 745 binary interactions were added to the IMEx human molecular interaction dataset. CONCLUSION: This represents a significant enhancement in the expert curated data pertinent to Alzheimer's disease available in a number of biomedical databases. Relevant protein entries have been updated in UniProtKB and concomitantly in the Gene Ontology. Molecular interaction networks have been significantly extended in the IMEx Consortium dataset and a set of reference protein complexes created. All the resources described are open-source and freely available to the research community and we provide examples of how these data could be exploited by researchers.

Subjects

Alzheimer Disease/genetics, Computational Biology/methods, Protein Databases, Expert Systems, Protein Interaction Maps/genetics, Public Sector, Alzheimer Disease/diagnosis, Humans

4.

Enzyme annotation in UniProtKB using Rhea.

Morgat, Anne; Lombardot, Thierry; Coudert, Elisabeth; Axelsen, Kristian; Neto, Teresa Batista; Gehant, Sebastien; Bansal, Parit; Bolleman, Jerven; Gasteiger, Elisabeth; de Castro, Edouard; Baratin, Delphine; Pozzato, Monica; Xenarios, Ioannis; Poux, Sylvain; Redaschi, Nicole; Bridge, Alan.

Bioinformatics ; 36(6): 1896-1901, 2020 03 01.

Article in English | MEDLINE | ID: mdl-31688925

ABSTRACT

MOTIVATION: To provide high quality computationally tractable enzyme annotation in UniProtKB using Rhea, a comprehensive expert-curated knowledgebase of biochemical reactions which describes reaction participants using the ChEBI (Chemical Entities of Biological Interest) ontology. RESULTS: We replaced existing textual descriptions of biochemical reactions in UniProtKB with their equivalents from Rhea, which is now the standard for annotation of enzymatic reactions in UniProtKB. We developed improved search and query facilities for the UniProt website, REST API and SPARQL endpoint that leverage the chemical structure data, nomenclature and classification that Rhea and ChEBI provide. AVAILABILITY AND IMPLEMENTATION: UniProtKB at https://www.uniprot.org; UniProt REST API at https://www.uniprot.org/help/api; UniProt SPARQL endpoint at https://sparql.uniprot.org/; Rhea at https://www.rhea-db.org.

Subjects

Rheiformes, Animals, Protein Databases, Knowledge Bases
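
As a rough illustration of the programmatic access mentioned in the abstract above, the following Python sketch assembles a SPARQL query that asks for reviewed UniProtKB proteins together with the Rhea reactions they are annotated to catalyze. It only builds the query and a request URL; nothing is sent over the network. The predicate names are written from memory of the UniProt RDF core vocabulary, and the `/sparql` query path is an assumption, so both should be checked against the endpoint documentation before use. The helper names `build_rhea_query` and `build_request_url` are hypothetical.

```python
from urllib.parse import urlencode

# Assumption: the query service sits under /sparql on the host named in the abstract.
SPARQL_ENDPOINT = "https://sparql.uniprot.org/sparql"


def build_rhea_query(limit=10):
    """Return a SPARQL query for reviewed proteins and their annotated Rhea reactions.

    The up: predicates below are assumed from the UniProt core vocabulary.
    """
    return f"""PREFIX up: <http://purl.uniprot.org/core/>
SELECT ?protein ?rhea
WHERE {{
  ?protein a up:Protein ;
           up:reviewed true ;
           up:annotation ?annotation .
  ?annotation a up:Catalytic_Activity_Annotation ;
              up:catalyticActivity ?activity .
  ?activity up:catalyzedReaction ?rhea .
}}
LIMIT {int(limit)}"""


def build_request_url(query, endpoint=SPARQL_ENDPOINT):
    """Encode the query as the standard SPARQL protocol `query` GET parameter."""
    return endpoint + "?" + urlencode({"query": query})


if __name__ == "__main__":
    print(build_rhea_query(limit=5))
```

The resulting URL can be passed to any HTTP client; the result format is normally selected with an `Accept` header, which is left out of this sketch.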

5.

Annotation of gene product function from high-throughput studies using the Gene Ontology.

Attrill, Helen; Gaudet, Pascale; Huntley, Rachael P; Lovering, Ruth C; Engel, Stacia R; Poux, Sylvain; Van Auken, Kimberly M; Georghiou, George; Chibucos, Marcus C; Berardini, Tanya Z; Wood, Valerie; Drabkin, Harold; Fey, Petra; Garmiri, Penelope; Harris, Midori A; Sawford, Tony; Reiser, Leonore; Tauber, Rebecca; Toro, Sabrina.

Database (Oxford) ; 2019, 2019 01 01.

Article in English | MEDLINE | ID: mdl-30715275

ABSTRACT

High-throughput studies constitute an essential and valued source of information for researchers. However, high-throughput experimental workflows are often complex, with multiple data sets that may contain large numbers of false positives. The representation of high-throughput data in the Gene Ontology (GO) therefore presents a challenging annotation problem, when the overarching goal of GO curation is to provide the most precise view of a gene's role in biology. To address this, representatives from annotation teams within the GO Consortium reviewed high-throughput data annotation practices. We present an annotation framework for high-throughput studies that will facilitate good standards in GO curation and, through the use of new high-throughput evidence codes, increase the visibility of these annotations to the research community.

Subjects

Genetic Databases, Gene Ontology, Genomics/methods, Molecular Sequence Annotation/methods, Animals, High-Throughput Nucleotide Sequencing, Humans, DNA Sequence Analysis

6.

Scaling up data curation using deep learning: An application to literature triage in genomic variation resources.

Lee, Kyubum; Famiglietti, Maria Livia; McMahon, Aoife; Wei, Chih-Hsuan; MacArthur, Jacqueline Ann Langdon; Poux, Sylvain; Breuza, Lionel; Bridge, Alan; Cunningham, Fiona; Xenarios, Ioannis; Lu, Zhiyong.

PLoS Comput Biol ; 14(8): e1006390, 2018 08.

Article in English | MEDLINE | ID: mdl-30102703

ABSTRACT

Manually curating biomedical knowledge from publications is necessary to build a knowledge-based service that provides highly precise and organized information to users. The process of retrieving relevant publications for curation, which is also known as document triage, is usually carried out by querying and reading articles in PubMed. However, this query-based method often obtains unsatisfactory precision and recall on the retrieved results, and it is difficult to manually generate optimal queries. To address this, we propose a machine learning-assisted triage method. We collect previously curated publications from two databases, UniProtKB/Swiss-Prot and the NHGRI-EBI GWAS Catalog, and use them as a gold-standard dataset for training deep learning models based on convolutional neural networks. We then use the trained models to classify and rank new publications for curation. For evaluation, we apply our method to the real-world manual curation process of UniProtKB/Swiss-Prot and the GWAS Catalog. We demonstrate that our machine-assisted triage method outperforms the current query-based triage methods, improves efficiency, and enriches curated content. Our method achieves a precision 1.81 and 2.99 times higher than that obtained by the current query-based triage methods of UniProtKB/Swiss-Prot and the GWAS Catalog, respectively, without compromising recall. In fact, our method retrieves many additional relevant publications that the query-based method of UniProtKB/Swiss-Prot could not find. As these results show, our machine learning-based method can make the triage process more efficient and is being implemented in production so that human curators can focus on more challenging tasks to improve the quality of knowledge bases.

Subjects

Data Curation/methods, Information Storage and Retrieval/methods, Data Curation/statistics & numerical data, Genetic Databases, Protein Databases, Deep Learning, Genomics, Knowledge Bases, Machine Learning, Publications
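
The triage abstract above compares methods by precision and recall. The following Python sketch shows how those two measures, and a precision ratio between two triage methods, could be computed. The publication identifiers and counts are invented for illustration and do not reproduce the figures reported in the abstract; `precision_recall` is a hypothetical helper.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a set of retrieved items.

    precision = relevant retrieved / all retrieved
    recall    = relevant retrieved / all relevant
    """
    retrieved, relevant = set(retrieved), set(relevant)
    true_positives = len(retrieved & relevant)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    # Toy data (made up): four publications are truly relevant for curation.
    relevant = {"p1", "p2", "p3", "p4"}
    query_hits = {"p1", "p2", "p5", "p6", "p7", "p8", "p9", "p10"}
    model_hits = {"p1", "p2", "p3", "p11"}

    query_precision, query_recall = precision_recall(query_hits, relevant)
    model_precision, model_recall = precision_recall(model_hits, relevant)
    print("query-based:", query_precision, query_recall)
    print("model-based:", model_precision, model_recall)
    print("precision ratio:", model_precision / query_precision)
```

A ratio above 1 with recall that does not drop corresponds to the kind of improvement the abstract reports.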

7.

On expert curation and scalability: UniProtKB/Swiss-Prot as a case study.

Poux, Sylvain; Arighi, Cecilia N; Magrane, Michele; Bateman, Alex; Wei, Chih-Hsuan; Lu, Zhiyong; Boutet, Emmanuel; Bye-A-Jee, Hema; Famiglietti, Maria Livia; Roechert, Bernd; UniProt Consortium, The.

Bioinformatics ; 33(21): 3454-3460, 2017 Nov 01.

Article in English | MEDLINE | ID: mdl-29036270

ABSTRACT

MOTIVATION: Biological knowledgebases, such as UniProtKB/Swiss-Prot, constitute an essential component of daily scientific research by offering distilled, summarized and computable knowledge extracted from the literature by expert curators. While knowledgebases play an increasingly important role in the scientific community, their ability to keep up with the growth of biomedical literature is under scrutiny. Using UniProtKB/Swiss-Prot as a case study, we address this concern via multiple literature triage approaches. RESULTS: With the assistance of the PubTator text-mining tool, we tagged more than 10 000 articles to assess the ratio of papers relevant for curation. We first show that curators read and evaluate many more papers than they curate, and that measuring the number of curated publications is insufficient to provide a complete picture as demonstrated by the fact that 8000-10 000 papers are curated in UniProt each year while curators evaluate 50 000-70 000 papers per year. We show that 90% of the papers in PubMed are out of the scope of UniProt, that a maximum of 2-3% of the papers indexed in PubMed each year are relevant for UniProt curation, and that, despite appearances, expert curation in UniProt is scalable. AVAILABILITY AND IMPLEMENTATION: UniProt is freely available at http://www.uniprot.org/. CONTACT: sylvain.poux@sib.swiss. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Data Curation, Protein Databases, Data Curation/statistics & numerical data, Data Mining, Protein Databases/statistics & numerical data, Humans, Knowledge Bases, PubMed/statistics & numerical data, Review Literature as Topic, Statistics as Topic

8.

Bacterial Virus Ontology; Coordinating across Databases.

Hulo, Chantal; Masson, Patrick; Toussaint, Ariane; Osumi-Sutherland, David; de Castro, Edouard; Auchincloss, Andrea H; Poux, Sylvain; Bougueleret, Lydie; Xenarios, Ioannis; Le Mercier, Philippe.

Viruses ; 9(6), 2017 05 23.

Article in English | MEDLINE | ID: mdl-28545254

ABSTRACT

Bacterial viruses, also called bacteriophages, display great genetic diversity and utilize unique processes for infecting and reproducing within a host cell. All these processes were investigated and indexed in the ViralZone knowledge base. To facilitate standardizing data, a simple ontology of viral life-cycle terms was developed to provide a common vocabulary for annotating data sets. New terminology was developed to address unique viral replication cycle processes, and existing terminology was modified and adapted. Classically, the viral life-cycle is described by schematic pictures. Using this ontology, it can be represented by a combination of successive events: entry, latency, transcription/replication, host-virus interactions and virus release. Each of these parts is broken down into discrete steps. For example, enterobacteria phage lambda entry is broken down into: viral attachment to host adhesion receptor, viral attachment to host entry receptor, viral genome ejection and viral genome circularization. To demonstrate the utility of a standard ontology for virus biology, this work was completed by annotating virus data in the ViralZone, UniProtKB and Gene Ontology databases.

Subjects

Bacteriophages/genetics, Bacteriophages/physiology, Biological Ontologies, Bacteriophages/classification, Bacteriophages/growth & development, Factual Databases, Host-Pathogen Interactions, Terminology as Topic

9.

The ins and outs of eukaryotic viruses: Knowledge base and ontology of a viral infection.

Hulo, Chantal; Masson, Patrick; de Castro, Edouard; Auchincloss, Andrea H; Foulger, Rebecca; Poux, Sylvain; Lomax, Jane; Bougueleret, Lydie; Xenarios, Ioannis; Le Mercier, Philippe.

PLoS One ; 12(2): e0171746, 2017.

Article in English | MEDLINE | ID: mdl-28207819

ABSTRACT

Viruses are genetically diverse, infect a wide range of tissues and host cells and follow unique processes for replicating themselves. All these processes were investigated and indexed in the ViralZone knowledge base. To facilitate standardizing data, a simple ontology of viral life-cycle terms was developed to provide a common vocabulary for annotating data sets. New terminology was developed to address unique viral replication cycle processes, and existing terminology was modified and adapted. The virus life-cycle is classically described by schematic pictures. Using this ontology, it can be represented by a combination of successive terms: "entry", "latency", "transcription", "replication" and "exit". Each of these parts is broken down into discrete steps. For example, Zika virus "entry" is broken down into successive steps: "Attachment", "Apoptotic mimicry", "Viral endocytosis/macropinocytosis", "Fusion with host endosomal membrane", "Viral factory". To demonstrate the utility of a standard ontology for virus biology, this work was completed by annotating virus data in the ViralZone, UniProtKB and Gene Ontology databases.

Subjects

Eukaryotic Cells/virology, Terminology as Topic, Virus Diseases/virology, Virus Physiological Phenomena, Genetic Databases, Virus Replication, Viruses/genetics, Viruses/pathogenicity

10.

Best Practices in Manual Annotation with the Gene Ontology.

Poux, Sylvain; Gaudet, Pascale.

Methods Mol Biol ; 1446: 41-54, 2017.

Article in English | MEDLINE | ID: mdl-27812934

ABSTRACT

The Gene Ontology (GO) is a framework designed to represent biological knowledge about gene products' biological roles and the cellular location in which they act. Biocuration is a complex process: the body of scientific literature is large and selection of appropriate GO terms can be challenging. Both these issues are compounded by the fact that our understanding of biology is still incomplete; hence it is important to appreciate that GO is inherently an evolving model. In this chapter, we describe how biocurators create GO annotations from experimental findings from research articles. We describe the current best practices for high-quality literature curation and how GO curators succeed in modeling biology using a relatively simple framework. We also highlight a number of difficulties when translating experimental assays into GO annotations.

Subjects

Computational Biology/methods, Gene Ontology, Molecular Sequence Annotation/methods, Animals, Protein Databases, Humans, Phenotype, Proteins/genetics, Proteins/metabolism

11.

The UniProtKB guide to the human proteome.

Breuza, Lionel; Poux, Sylvain; Estreicher, Anne; Famiglietti, Maria Livia; Magrane, Michele; Tognolli, Michael; Bridge, Alan; Baratin, Delphine; Redaschi, Nicole.

Database (Oxford) ; 2016, 2016.

Article in English | MEDLINE | ID: mdl-26896845

ABSTRACT

Advances in high-throughput and advanced technologies allow researchers to routinely perform whole genome and proteome analysis. For this purpose, they need high-quality resources providing comprehensive gene and protein sets for their organisms of interest. Using the example of the human proteome, we will describe the content of a complete proteome in the UniProt Knowledgebase (UniProtKB). We will show how manual expert curation of UniProtKB/Swiss-Prot is complemented by expert-driven automatic annotation to build a comprehensive, high-quality and traceable resource. We will also illustrate how the complexity of the human proteome is captured and structured in UniProtKB. Database URL: www.uniprot.org.

Subjects

Protein Databases, Proteome/genetics, Proteomics/methods, Automation, Genome, Humans, Knowledge Bases, Phenotype, Post-Translational Protein Processing, RNA Editing, Software

12.

UniProtKB/Swiss-Prot, the Manually Annotated Section of the UniProt KnowledgeBase: How to Use the Entry View.

Boutet, Emmanuel; Lieberherr, Damien; Tognolli, Michael; Schneider, Michel; Bansal, Parit; Bridge, Alan J; Poux, Sylvain; Bougueleret, Lydie; Xenarios, Ioannis.

Methods Mol Biol ; 1374: 23-54, 2016.

Article in English | MEDLINE | ID: mdl-26519399

ABSTRACT

The Universal Protein Resource (UniProt, http://www.uniprot.org) consortium is an initiative of the SIB Swiss Institute of Bioinformatics (SIB), the European Bioinformatics Institute (EBI) and the Protein Information Resource (PIR) to provide the scientific community with a central resource for protein sequences and functional information. The UniProt consortium maintains the UniProt KnowledgeBase (UniProtKB), updated every 4 weeks, and several supplementary databases including the UniProt Reference Clusters (UniRef) and the UniProt Archive (UniParc). The Swiss-Prot section of the UniProt KnowledgeBase (UniProtKB/Swiss-Prot) contains publicly available expertly manually annotated protein sequences obtained from a broad spectrum of organisms. Plant protein entries are produced in the frame of the Plant Proteome Annotation Program (PPAP), with an emphasis on characterized proteins of Arabidopsis thaliana and Oryza sativa. High-level annotations provided by UniProtKB/Swiss-Prot are widely used to predict annotation of newly available proteins through automatic pipelines. The purpose of this chapter is to present a guided tour of a UniProtKB/Swiss-Prot entry. We will also present some of the tools and databases that are linked to each entry.

Subjects

Computational Biology/methods, Protein Databases, Animals, Humans, Web Browser

13.

The Confidence Information Ontology: a step towards a standard for asserting confidence in annotations.

Bastian, Frederic B; Chibucos, Marcus C; Gaudet, Pascale; Giglio, Michelle; Holliday, Gemma L; Huang, Hong; Lewis, Suzanna E; Niknejad, Anne; Orchard, Sandra; Poux, Sylvain; Skunca, Nives; Robinson-Rechavi, Marc.

Database (Oxford) ; 2015: bav043, 2015.

Article in English | MEDLINE | ID: mdl-25957950

ABSTRACT

Biocuration has become a cornerstone for analyses in biology, and to meet needs, the number of annotations has grown considerably in recent years. However, the reliability of these annotations varies; it has thus become necessary to be able to assess the confidence in annotations. Although several resources already provide confidence information about the annotations that they produce, a standard way of providing such information has yet to be defined. This lack of standardization undermines the propagation of knowledge across resources, as well as the credibility of results from high-throughput analyses. Seeded at a workshop during the Biocuration 2012 conference, a working group has been created to address this problem. We present here the elements that were identified as essential for assessing confidence in annotations, as well as a draft ontology, the Confidence Information Ontology, to illustrate how the problems identified could be addressed. We hope that this effort will provide a home for discussing this major issue among the biocuration community. Tracker URL: https://github.com/BgeeDB/confidence-information-ontology Ontology URL: https://raw.githubusercontent.com/BgeeDB/confidence-information-ontology/master/src/ontology/cio-simple.obo

Subjects

Biological Ontologies, Data Curation/standards, Congresses as Topic

14.

HAMAP in 2015: updates to the protein family classification and annotation system.

Pedruzzi, Ivo; Rivoire, Catherine; Auchincloss, Andrea H; Coudert, Elisabeth; Keller, Guillaume; de Castro, Edouard; Baratin, Delphine; Cuche, Béatrice A; Bougueleret, Lydie; Poux, Sylvain; Redaschi, Nicole; Xenarios, Ioannis; Bridge, Alan.

Nucleic Acids Res ; 43(Database issue): D1064-70, 2015 Jan.

Article in English | MEDLINE | ID: mdl-25348399

ABSTRACT

HAMAP (High-quality Automated and Manual Annotation of Proteins; available at http://hamap.expasy.org/) is a system for the automatic classification and annotation of protein sequences. HAMAP provides annotation of the same quality and detail as UniProtKB/Swiss-Prot, using manually curated profiles for protein sequence family classification and expert curated rules for functional annotation of family members. HAMAP data and tools are made available through our website and as part of the UniRule pipeline of UniProt, providing annotation for millions of unreviewed sequences of UniProtKB/TrEMBL. Here we report on the growth of HAMAP and updates to the HAMAP system since our last report in the NAR Database Issue of 2013. We continue to augment HAMAP with new family profiles and annotation rules as new protein families are characterized and annotated in UniProtKB/Swiss-Prot; the latest version of HAMAP (as of 3 September 2014) contains 1983 family classification profiles and 1998 annotation rules (up from 1780 and 1720). We demonstrate how the complex logic of HAMAP rules allows for precise annotation of individual functional variants within large homologous protein families. We also describe improvements to our web-based tool HAMAP-Scan which simplify the classification and annotation of sequences, and the incorporation of an improved sequence-profile search algorithm.

Subjects

Protein Databases, Molecular Sequence Annotation, Amino Acid Sequence Homology, Humans, Internet, Proteins/classification

15.

An integrated ontology resource to explore and study host-virus relationships.

Masson, Patrick; Hulo, Chantal; de Castro, Edouard; Foulger, Rebecca; Poux, Sylvain; Bridge, Alan; Lomax, Jane; Bougueleret, Lydie; Xenarios, Ioannis; Le Mercier, Philippe.

PLoS One ; 9(9): e108075, 2014.

Article in English | MEDLINE | ID: mdl-25233094

ABSTRACT

Our growing knowledge of viruses reveals how these pathogens manage to evade innate host defenses. A global scheme emerges in which many viruses usurp key cellular defense mechanisms and often inhibit the same components of antiviral signaling. To accurately describe these processes, we have generated a comprehensive dictionary for eukaryotic host-virus interactions. This controlled vocabulary has been detailed in 57 ViralZone resource web pages which contain a global description of all molecular processes. In order to annotate viral gene products with this vocabulary, an ontology has been built in a hierarchy of UniProt Knowledgebase (UniProtKB) keyword terms and corresponding Gene Ontology (GO) terms have been developed in parallel. The results are 65 UniProtKB keywords related to 57 GO terms, which have been used in 14,390 manual annotations and 908,723 automatic annotations, and propagated to an estimated 922,941 GO annotations. ViralZone pages, UniProtKB keywords and GO terms provide complementary tools to users, and the three resources have been linked to each other through host-virus vocabulary.

Subjects

Gene Ontology, Host-Pathogen Interactions/genetics, Adaptive Immunity/genetics, Animals, Nucleic Acid Databases, Gene Expression Regulation/immunology, Humans, Innate Immunity, Interferons/genetics, Virus Diseases/genetics, Virus Diseases/immunology, Virus Diseases/virology

16.

Genetic variations and diseases in UniProtKB/Swiss-Prot: the ins and outs of expert manual curation.

Famiglietti, Maria Livia; Estreicher, Anne; Gos, Arnaud; Bolleman, Jerven; Géhant, Sébastien; Breuza, Lionel; Bridge, Alan; Poux, Sylvain; Redaschi, Nicole; Bougueleret, Lydie; Xenarios, Ioannis.

Hum Mutat ; 35(8): 927-35, 2014 Aug.

Article in English | MEDLINE | ID: mdl-24848695

ABSTRACT

During the last few years, next-generation sequencing (NGS) technologies have accelerated the detection of genetic variants resulting in the rapid discovery of new disease-associated genes. However, the wealth of variation data made available by NGS alone is not sufficient to understand the mechanisms underlying disease pathogenesis and manifestation. Multidisciplinary approaches combining sequence and clinical data with prior biological knowledge are needed to unravel the role of genetic variants in human health and disease. In this context, it is crucial that these data are linked, organized, and made readily available through reliable online resources. The Swiss-Prot section of the Universal Protein Knowledgebase (UniProtKB/Swiss-Prot) provides the scientific community with a collection of information on protein functions, interactions, biological pathways, as well as human genetic diseases and variants, all manually reviewed by experts. In this article, we present an overview of the information content of UniProtKB/Swiss-Prot to show how this knowledgebase can support researchers in the elucidation of the mechanisms leading from a molecular defect to a disease phenotype.

Subjects

Protein Databases/statistics & numerical data, Genetic Association Studies, Medical Genetics, Knowledge Bases, Proteome, Software, Amino Acid Sequence, Genetic Variation, Human Genome, High-Throughput Nucleotide Sequencing, Humans, Internet, Molecular Sequence Annotation, Molecular Sequence Data, Terminology as Topic

17.

Expert curation in UniProtKB: a case study on dealing with conflicting and erroneous data.

Poux, Sylvain; Magrane, Michele; Arighi, Cecilia N; Bridge, Alan; O'Donovan, Claire; Laiho, Kati.

Database (Oxford) ; 2014: bau016, 2014.

Article in English | MEDLINE | ID: mdl-24622611

ABSTRACT

UniProtKB/Swiss-Prot provides expert curation with information extracted from literature and curator-evaluated computational analysis. As knowledgebases continue to play an increasingly important role in scientific research, a number of studies have evaluated their accuracy and revealed various errors. While some are curation errors, others are the result of incorrect information published in the scientific literature. By taking the example of sirtuin-5, a complex annotation case, we will describe the curation procedure of UniProtKB/Swiss-Prot and detail how we report conflicting information in the database. We will demonstrate the importance of collaboration between resources to ensure curation consistency and the value of contributions from the user community in helping maintain error-free resources. Database URL: www.uniprot.org.

Subjects

Data Mining/methods, Protein Databases, Statistics as Topic, Automation, Cooperative Behavior, Humans, Molecular Sequence Annotation

18.

HAMAP in 2013, new developments in the protein family classification and annotation system.

Pedruzzi, Ivo; Rivoire, Catherine; Auchincloss, Andrea H; Coudert, Elisabeth; Keller, Guillaume; de Castro, Edouard; Baratin, Delphine; Cuche, Béatrice A; Bougueleret, Lydie; Poux, Sylvain; Redaschi, Nicole; Xenarios, Ioannis; Bridge, Alan.

Nucleic Acids Res ; 41(Database issue): D584-9, 2013 Jan.

Article in English | MEDLINE | ID: mdl-23193261

ABSTRACT

HAMAP (High-quality Automated and Manual Annotation of Proteins; available at http://hamap.expasy.org/) is a system for the classification and annotation of protein sequences. It consists of a collection of manually curated family profiles for protein classification, and associated annotation rules that specify annotations that apply to family members. HAMAP was originally developed to support the manual curation of UniProtKB/Swiss-Prot records describing microbial proteins. Here we describe new developments in HAMAP, including the extension of HAMAP to eukaryotic proteins, the use of HAMAP in the automated annotation of UniProtKB/TrEMBL, providing high-quality annotation for millions of protein sequences, and the future integration of HAMAP into a unified system for UniProtKB annotation, UniRule. HAMAP is continuously updated by expert curators with new family profiles and annotation rules as new protein families are characterized. The collection of HAMAP family classification profiles and annotation rules can be browsed and viewed on the HAMAP website, which also provides an interface to scan user sequences against HAMAP profiles.

Subjects

Databases, Protein; Molecular Sequence Annotation; Proteins/classification; Eukaryota/genetics; Internet

19.

UniProtKB amid the turmoil of plant proteomics research.

Schneider, Michel; Poux, Sylvain.

Front Plant Sci ; 3: 270, 2012.

Article in English | MEDLINE | ID: mdl-23230445

ABSTRACT

The UniProt KnowledgeBase (UniProtKB) provides a single, centralized, authoritative resource for protein sequences and functional information. The majority of its records are based on automatic translation of coding sequences (CDS) provided by submitters at the time of initial deposition to the nucleotide sequence databases (INSDC). This article will give a general overview of the current situation, with some specific illustrations extracted from our annotation of Arabidopsis and rice proteomes. More and more frequently, only the raw sequence of a complete genome is deposited to the nucleotide sequence databases and the gene model predictions and annotations are kept in separate, specialized model organism databases (MODs). In order to be able to provide the complete proteome of model organisms, UniProtKB had to implement pipelines for import of protein sequences from Ensembl and EnsemblGenomes. A single genome can be the target of several unrelated sequencing projects and the final assembly and gene model predictions may diverge quite significantly. In addition, several cultivars of the same species are often sequenced (the sequencing of 1001 Arabidopsis cultivars is currently under way) and the resulting proteomes are far from being identical. Therefore, one challenge for UniProtKB is to store and organize these data in a convenient way and to clearly define reference proteomes that should be made available to users. Manual annotation is one of the hallmarks of the Swiss-Prot section of UniProtKB. Besides adding functional annotation, curators check, and often correct, gene model predictions. For plants, this task is limited to Arabidopsis thaliana and Oryza sativa subsp. japonica. Proteomics data providing experimental evidence confirming the existence of proteins or identifying sequence features such as post-translational modifications are also imported into UniProtKB records and the knowledgebase is cross-referenced to numerous proteomics resources.

20.

The UniProtKB/Swiss-Prot Tox-Prot program: A central hub of integrated venom protein data.

Jungo, Florence; Bougueleret, Lydie; Xenarios, Ioannis; Poux, Sylvain.

Toxicon ; 60(4): 551-7, 2012 Sep 15.

Article in English | MEDLINE | ID: mdl-22465017

ABSTRACT

Animal toxins are of interest to a wide range of scientists, due to their numerous applications in pharmacology, neurology, hematology, medicine, and drug research. This, and to a lesser extent the development of new high-performance tools in transcriptomics and proteomics, has led to an increase in toxin discovery. In this context, providing publicly available data on animal toxins has become essential. The UniProtKB/Swiss-Prot Tox-Prot program (http://www.uniprot.org/program/Toxins) plays a crucial role by providing such access to venom protein sequences and functions from all venomous species. This program has up to now curated more than 5000 venom proteins to the high-quality standards of UniProtKB/Swiss-Prot (release 2012_02). Proteins targeted by these toxins are also available in the knowledgebase. This paper describes in detail the type of information provided by UniProtKB/Swiss-Prot for toxins, as well as the structured format of the knowledgebase.

Subjects

Databases, Protein; Software; Venoms/chemistry; Animals; Data Display; Molecular Sequence Data; Protein Conformation; Proteins/chemistry; Sequence Alignment; Terminology as Topic

